## [1] 1599 13
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
##
## 8.4 8.5 8.7 8.8 9 9.05
## 2 1 2 2 30 1
## 9.1 9.2 9.233333333 9.25 9.3 9.4
## 23 72 1 1 59 103
## 9.5 9.55 9.566666667 9.6 9.7 9.8
## 139 2 1 59 54 78
## 9.9 9.95 10 10.03333333 10.1 10.2
## 49 1 67 2 47 46
## 10.3 10.4 10.5 10.55 10.6 10.7
## 33 41 67 2 28 27
## 10.75 10.8 10.9 11 11.06666667 11.1
## 1 42 49 59 1 27
## 11.2 11.3 11.4 11.5 11.6 11.7
## 36 32 32 30 15 23
## 11.8 11.9 11.95 12 12.1 12.2
## 29 20 1 21 13 12
## 12.3 12.4 12.5 12.6 12.7 12.8
## 12 13 21 6 9 17
## 12.9 13 13.1 13.2 13.3 13.4
## 9 6 2 1 3 3
## 13.5 13.56666667 13.6 14 14.9
## 1 1 4 7 1
## Observations: 1,599
## Variables: 13
## $ X <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...
## $ fixed.acidity <dbl> 7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, ...
## $ volatile.acidity <dbl> 0.700, 0.880, 0.760, 0.280, 0.700, 0.660,...
## $ citric.acid <dbl> 0.00, 0.00, 0.04, 0.56, 0.00, 0.00, 0.06,...
## $ residual.sugar <dbl> 1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2...
## $ chlorides <dbl> 0.076, 0.098, 0.092, 0.075, 0.076, 0.075,...
## $ free.sulfur.dioxide <dbl> 11, 25, 15, 17, 11, 13, 15, 15, 9, 17, 15...
## $ total.sulfur.dioxide <dbl> 34, 67, 54, 60, 34, 40, 59, 21, 18, 102, ...
## $ density <dbl> 0.9978, 0.9968, 0.9970, 0.9980, 0.9978, 0...
## $ pH <dbl> 3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30,...
## $ sulphates <dbl> 0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46,...
## $ alcohol <dbl> 9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0, ...
## $ quality <int> 5, 5, 5, 6, 5, 5, 5, 7, 7, 5, 5, 5, 5, 5,...
## 'data.frame': 1599 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.20 Median :6.000
## Mean :10.42 Mean :5.636
## 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :14.90 Max. :8.000
This report explores the red wine quality dataset, which contains 1,599 observations and 13 variables. The aim of the project is to show how acidity, total sulfur dioxide, free sulfur dioxide, chlorides, pH, density, sulphates, alcohol, and residual sugar affect wine quality, by reviewing the relationships between these variables and understanding their structure using the R programming language in RStudio.
## 3 4 5 6 7 8
## 10 53 681 638 199 18
The quality distribution shows that most red wines are rated 5 or 6 (1,319 of 1,599 wines, about 82%).
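The share of wines rated 5 or 6 can be checked directly from the counts in the table above. A minimal sketch in R, with the counts typed in by hand (the variable names are illustrative and not taken from the original analysis):

```r
# Quality counts copied from the table() output above.
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)

# Total number of wines; this should match the 1599 observations.
total_wines <- sum(quality_counts)

# Proportion of wines rated 5 or 6.
share_5_or_6 <- sum(quality_counts[c("5", "6")]) / total_wines
round(share_5_or_6, 3)
```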
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
The alcohol distribution is right-skewed, with a single peak between about 9 and 10.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
The fixed acidity distribution is positively skewed, with two peaks at about 7 and 8 and a few outliers.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
The volatile acidity distribution looks roughly normal, apart from some outliers; most wines contain less than 0.8 g/liter.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
The graph shows that most wines contain 0.5 g/liter or less of citric acid. The distribution has two peaks, at 0 and at around 0.46, and a few outliers.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
The residual sugar distribution is right-skewed, with a high peak at around 2 and some high outliers; the mean (2.539) lies between the median and the 3rd quartile.
## [1] 0.85
## [1] 3.65
After removing the outliers (values outside the range 0.85 to 3.65), the distribution looks approximately normal.
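The two values printed above (0.85 and 3.65) match the usual 1.5 × IQR rule applied to the residual sugar quartiles. A minimal sketch, assuming that rule is how the cut-offs were obtained; the helper name `iqr_fences` and the data frame name `wine` are hypothetical:

```r
# Hypothetical helper: lower and upper k * IQR fences computed from the quartiles.
iqr_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles of residual.sugar taken from the summary() output above.
sugar_fences <- iqr_fences(q1 = 1.9, q3 = 2.6)
sugar_fences

# Assumed filtering step on a data frame named `wine` (not run here):
# wine_trimmed <- subset(wine, residual.sugar > sugar_fences["lower"] &
#                              residual.sugar < sugar_fences["upper"])
```

Calling the same helper with the chlorides quartiles, `iqr_fences(0.07, 0.09)`, gives 0.04 and 0.12, which agrees with the values printed in the chlorides section.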
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
The free sulfur dioxide distribution is right-skewed, with a high peak at around 4 to 6. About 75% of wines contain 21 mg/liter or less of free sulfur dioxide.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
The total sulfur dioxide distribution is similar to that of free sulfur dioxide, with some high outliers.
After applying a log10 transformation to reduce the effect of the outliers, the distribution looks closer to normal.
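The effect of the log10 transformation can be illustrated with the five-number summary of total sulfur dioxide alone. The sketch below is not the original plotting code; it only compares how far the minimum and maximum sit from the median before and after the transformation:

```r
# Five-number summary of total.sulfur.dioxide from the summary() output above.
so2 <- c(min = 6, q1 = 22, median = 38, q3 = 62, max = 289)

# Distance of each extreme from the median on the raw scale.
raw_spread <- c(below = so2[["median"]] - so2[["min"]],
                above = so2[["max"]] - so2[["median"]])

# The same distances after a log10 transformation.
log_so2 <- log10(so2)
log_spread <- c(below = log_so2[["median"]] - log_so2[["min"]],
                above = log_so2[["max"]] - log_so2[["median"]])

raw_spread  # the upper tail is much longer than the lower tail
log_spread  # the two tails are of similar length
```

If the histograms are drawn with ggplot2 (an assumption), the same transformation can be applied to the axis with `scale_x_log10()`.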
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
The chlorides distribution is right-skewed: most wines contain less than 0.2 g/liter of chlorides, with some very high outliers.
## [1] 0.04
## [1] 0.12
After removing the outliers (values outside the range 0.04 to 0.12), the chlorides distribution looks symmetric.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0037
The density distribution is approximately normal, with a mean of 0.9967 and a median of 0.9968, which are very close together.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
The pH distribution is also approximately normal and resembles the density distribution. Most wines have a pH between 3 and 3.5, and the mean and median are very close.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
The sulphates distribution is right-skewed, with a median of 0.62 and a high peak at around 0.563.
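Several of the descriptions above rely on how close the mean is to the median. One rough way to compare variables on this point is to scale the gap between mean and median by the interquartile range. This indicator is an illustrative assumption and not part of the original analysis; the numbers are copied from the summaries above:

```r
# Summary statistics copied from the summary() outputs above.
summ <- data.frame(
  variable = c("residual.sugar", "chlorides", "density", "pH"),
  q1       = c(1.9,   0.07,    0.9956, 3.21),
  median   = c(2.2,   0.079,   0.9968, 3.31),
  mean     = c(2.539, 0.08747, 0.9967, 3.311),
  q3       = c(2.6,   0.09,    0.9978, 3.40)
)

# Illustrative indicator (an assumption): gap between mean and median,
# expressed as a fraction of the interquartile range.
summ$skew_hint <- (summ$mean - summ$median) / (summ$q3 - summ$q1)

# Largest positive values suggest the strongest right skew.
summ[order(-summ$skew_hint), c("variable", "skew_hint")]
```

Values near zero (density, pH) are consistent with a roughly symmetric distribution, while clearly positive values (residual sugar, chlorides) are consistent with right skew.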
What is the structure of your dataset?
What is/are the main feature(s) of interest in your dataset?
What other features in the dataset do you think will help support your investigation into your feature(s) of interest?
Did you create any new variables from existing variables in the dataset?
Of the features you investigated, were there any unusual distributions?
First, this is a correlation matrix plot showing the relationships between the variables.
Dark blue indicates a strong positive correlation and light blue a weak positive correlation. Likewise, dark orange indicates a strong negative correlation and light orange a weak negative correlation.
Observations from the correlation matrix:
Quality is positively correlated with alcohol: the higher the alcohol content, the higher the quality tends to be.
Effect of acids on quality: quality is negatively correlated with volatile acidity and positively correlated with citric acid.
The density distribution is similar across all quality levels, so density does not seem to affect quality.
Volatile acidity and pH are both strongly negatively correlated with citric acid.
Fixed acidity is strongly positively correlated with citric acid.
pH is strongly negatively correlated with fixed acidity, and density is strongly positively correlated with fixed acidity.
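The numbers behind a correlation matrix plot like this come from pairwise correlation coefficients. The sketch below uses base R's `cor()` on only the first ten rows shown in the `str()` output, restricted to columns for which all ten values are printed. Because the sample is so small, the coefficients are illustrative only and will differ from those of the full dataset; the plotting package used for the original figure is not stated, so none is assumed here.

```r
# First ten rows of selected columns, copied from the str() output above.
# This is a tiny sample, so the coefficients are illustrative only.
wine_head <- data.frame(
  fixed.acidity    = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5),
  volatile.acidity = c(0.70, 0.88, 0.76, 0.28, 0.70, 0.66, 0.60, 0.65, 0.58, 0.50),
  citric.acid      = c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36),
  pH               = c(3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30, 3.39, 3.36, 3.35),
  alcohol          = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0, 9.5, 10.5),
  quality          = c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)
)

# Pearson correlation matrix (the default method of cor()).
cor_matrix <- cor(wine_head)
round(cor_matrix, 2)
```

On the full data, assuming the data frame is named `wine`, the equivalent call would be `cor(wine[, -1])`, which drops the row index `X` before computing the correlations.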
Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?
Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?
What was the strongest relationship you found?
Wines that are high in alcohol and sulphates tend to be rated as higher quality.
This graph shows that the correlation between alcohol and density is negative. Wines of quality 5 have the highest density and the lowest alcohol. In general, most high-quality wines have low density and high alcohol.
Wines with low sulfur dioxide tend to be rated as higher quality.
High-quality wines tend to have high alcohol and low pH.
Talk about some of the relationships you observed in this part of the
- High alcohol combined with low density, sulfur dioxide, and pH indicates a high-quality wine.
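One way to check a statement like this numerically is to average each feature within each quality level. The sketch below does so with base R's `aggregate()` on the first ten rows from the `str()` output only, so it shows the mechanics rather than reproducing the finding; with ten wines the group means are not representative of the full dataset.

```r
# First ten rows of selected columns, copied from the str() output above.
wine_head <- data.frame(
  total.sulfur.dioxide = c(34, 67, 54, 60, 34, 40, 59, 21, 18, 102),
  pH      = c(3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30, 3.39, 3.36, 3.35),
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0, 9.5, 10.5),
  quality = c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)
)

# Mean of each feature within each quality level.
by_quality <- aggregate(cbind(alcohol, pH, total.sulfur.dioxide) ~ quality,
                        data = wine_head, FUN = mean)
by_quality
```

Running the same call on the full data frame would give one row per quality level from 3 to 8, which can then be compared against the trends described above.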
Were there any interesting or surprising interactions between features?
Alcohol is the ingredient with the greatest influence on wine quality: a high alcohol content tends to mean a high-quality wine. Half of the wines in the dataset contain between 9.5% and 11.1% alcohol by volume.
Alcohol and sulphates are both positively correlated with quality: high alcohol and high sulphates indicate a high-quality wine.
Contrary to popular belief, sulphates are a harmless ingredient.
Alcohol and total sulfur dioxide have opposite relationships with wine quality: a high amount of alcohol and a small amount of sulfur dioxide indicate a high-quality wine. This makes sense given the concern that sulfur dioxide can be harmful to human health.
The red wine quality dataset contains information on 1,599 wines across 12 variables, plus a row index (X). I started by trying to understand the relationships between the chemical properties and the wines. First, I plotted each variable on its own in univariate plots, which made the data easier to understand and visualize. I found that quality ranges from 3 to 8 and that most wines in the dataset are rated 5 or 6. Alcohol had the strongest influence on quality. The acids were roughly normally distributed, but interestingly, many wines contain no citric acid. Residual sugar and chlorides are present in small amounts, and the remaining variables were roughly normally distributed. Then I plotted pairs of variables in bivariate plots to visualize the relationships, especially those with quality. After that, I plotted two or more variables together.
One of the observations that caught my interest was that a high amount of sulphates is associated with higher wine quality. I had understood sulphates to be a harmful ingredient, but after some research I found that this is just a myth.
I struggled to understand the relationships between the chemical properties and the wines, so I did a lot of searching on Google about winemaking, which took a long time.
In future work, I would like to compare white wine with red wine to find out which is better. It would also be interesting to add a new variable, ‘price’.
References:
- https://en.wikipedia.org/wiki/Acids_in_wine
- https://en.wikipedia.org/wiki/Sweetness_of_wine#Residual_sugar
- https://www.dummies.com/food-drink/drinks/wine/how-to-discern-wine-quality/
- https://vinepair.com/articles/chemical-compounds-wine-taste-smell/
- https://www.youtube.com/watch?v=jxUiIFj2l-s
- https://vinepair.com/?s=wine+citric+acid&submit=Search
- https://www.thekitchn.com/the-truth-about-sulfites-in-wine-myths-of-red-wine-headaches-100878